UniProt
UniProt, also known as the Universal Protein Resource, is a database of protein sequences and their functions. The database is free to use, easy to navigate, and provides high-quality information. It contains sequences from many reputable sources, as well as extensive information about the biological functions of proteins and their interactions.

History of UniProt

The main idea behind UniProt was to create a single place to check whether a protein has already been discovered and, if so, exactly what is known about it. UniProt was created by the European Bioinformatics Institute (EBI) and the Swiss Institute of Bioinformatics (SIB) together with the Protein Information Resource (PIR). EBI and SIB worked together to produce the Swiss-Prot and TrEMBL databases, while PIR produced PIR-PSD. These databases coexisted, each holding different information about proteins. In December 2003 the teams combined their efforts into a single entity, UniProt, designed to contain all of the information known about protein sequences and their functions.

The Core Databases

UniProt consists of four core databases: UniProtKB, UniParc, UniMES, and UniRef. Each holds different information and serves a different purpose.

UniProtKB - UniProtKB stands for UniProt Knowledgebase. It is the central collection of protein sequences and what is known about them, and it is divided into two sections. The TrEMBL section has many more sequences but is not expertly reviewed. As of February 2013, it held 29,769,971 sequence entries comprising 9,585,856,378 amino acids. The Swiss-Prot section includes only expertly reviewed sequences, so it is much smaller. As of February 2013, it held 539,165 protein sequences comprising 191,456,931 amino acids. (1)

UniParc - UniParc stands for UniProt Archive. This database pools the protein sequences from all of the main public protein databases. UniParc recognizes duplicates across its sources and removes the redundancy, so that each protein sequence is stored exactly once. When a source database changes, UniParc detects the change and records it in the history of the sequence. Here are some of the databases that UniParc draws from:

*INSDC EMBL-Bank/DDBJ/GenBank nucleotide sequence databases
*Ensembl
*European Patent Office (EPO)
*FlyBase
*H-Invitational Database (H-Inv)
*International Protein Index (IPI)
*Japan Patent Office (JPO)
*Protein Information Resource (PIR-PSD)
*Protein Data Bank (PDB)
*Protein Research Foundation (PRF)
*RefSeq
*Saccharomyces Genome Database (SGD)
*The Arabidopsis Information Resource (TAIR)
*TROME
*US Patent Office (USPTO)
*UniProtKB/Swiss-Prot, UniProtKB/Swiss-Prot protein isoforms, UniProtKB/TrEMBL
*Vertebrate and Genome Annotation Database (VEGA)
*WormBase

This list is from Wikipedia (1).

UniMES - UniMES stands for UniProt Metagenomic and Environmental Sequences. It holds sequences recovered from environmental samples, including sequences from unidentified species, and was created as a section dedicated to metagenomic and environmental data. The information in UniMES is not part of UniRef or UniProtKB, but it is still included in UniParc.

UniRef - UniRef stands for UniProt Reference Clusters. This database draws on UniProtKB and UniParc. It contains three databases that differ in the level of sequence identity used to cluster entries. UniRef100 groups identical sequences (100% identity) into a single record, which reduces the amount of information in the database and makes it easier to find what you are looking for. The other two, UniRef90 and UniRef50, cluster sequences that share at least 90% and at least 50% identity.

How to use UniProt

To use UniProt, enter the desired topic into the query box. The "Search in" box lets you choose which of the core databases the query runs against. When a search is run, all of the proteins associated with that topic are listed.
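The query-box search described above can also be approximated from a script. The following is only a minimal sketch: it assumes UniProt's public REST service at rest.uniprot.org and its `query`, `format`, and `size` parameters, none of which are described in this article, and the helper name `build_search_url` is hypothetical. It only builds the search URL for a chosen core database and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed REST host; not taken from this article.
BASE_URL = "https://rest.uniprot.org"

# Assumed mapping from the "Search in" choices to REST dataset names.
DATASETS = {
    "UniProtKB": "uniprotkb",
    "UniParc": "uniparc",
    "UniRef": "uniref",
}


def build_search_url(query, database="UniProtKB", fmt="json", size=5):
    """Return a search URL for the given topic and core database."""
    if database not in DATASETS:
        raise ValueError(f"unknown database: {database!r}")
    # urlencode escapes spaces and special characters in the query text.
    params = urlencode({"query": query, "format": fmt, "size": size})
    return f"{BASE_URL}/{DATASETS[database]}/search?{params}"


if __name__ == "__main__":
    # Same topic, searched in two different core databases.
    print(build_search_url("insulin"))
    print(build_search_url("insulin", database="UniParc"))
```

Switching the `database` argument plays the same role as changing the "Search in" box on the website. The resulting URL could then be opened in a browser or fetched with any HTTP client.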
Each result includes the protein's recommended name as well as the name it is currently given. It also shows the organism the protein comes from and its taxonomic lineage, along with the full sequence and the known and possible functions of the protein.

Another way to search for a protein in UniProt is to use the BLAST tab and enter the sequence into the search box. This brings up proteins that closely match the sequence you entered.

References

(1) "UniProt." Wikipedia. Wikimedia Foundation, 12 June 2013. Web. 12 Dec. 2013.
(2) "UniProt." UniProt. N.p., n.d. Web. 12 Dec. 2013.